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<~) Abstract 

We consider linear inverse problems where the solution is assumed to have a sparse expansion on an arbitrary 
pre-assigned orthonormal basis. We prove that replacing the usual quadratic regularizing penalties by weighted 
l p - penalties on the coefficients of such expansions, with 1 < p < 2, still regularizes the problem. If p < 2, 
regularized solutions of such Z p -penalized problems will have sparser expansions, with respect to the basis under 
. consideration. To compute the corresponding regularized solutions we propose an iterative algorithm which amounts 

to a Landweber iteration with thresholding (or nonlinear shrinkage) applied at each iteration step. We prove that 
this algorithm converges in norm. We also review some potential applications of this method. 
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1 Introduction 

1.1 Linear inverse problems 



In many practical problems in the sciences and applied sciences, the features of most interest cannot be observed 
directly, but have to be inferred from other, observable quantities. In the simplest approximation, which works 
surprisingly well in a wide range of cases and often suffices, there is a linear relationship between the features of 
interest and the observed quantities. If we model the object (the traditional shorthand for the features of interest) by 
a function /, and the derived quantities or image by another function h, we can cast the problem of inferring / from 
h as a linear inverse problem, the task of which is to solve the equation 



1^ ■ Kf = h 

o : 

This equation and the task of solving it make sense only when placed in an appropriate framework. In this paper, 
we shall assume that / and h belong to appropriate function spaces, typically Banach or Hilbert spaces, / £ ^object, 
h £ B IKlaE , and that K is a bounded operator from the space 2? 0BJECT to B IKlaE . The choice of the spaces must be appropriate 
for describing real- life situations. 

S The observations or data, which we shall model by yet another function, g, are typically not exactly equal to the 
image h — Kf, but rather to a distortion of h. This distortion is often modeled by an additive noise or error term e, 
i.e. 

g = h+e = Kf + e . 



1/2 

Moreover, one typically assumes that the "size" of the noise can be measured by its L 2 -norm, ||e|| = [J Q |e| 2 ) if e is 

a function on f2. (In a finite-dimensional situation, one uses ||e|| = (Xm=i l e «l 2 ) instead.) Our only "handle" on 
the image h is thus via the observed g, and we typically have little information on e = g — h beyond an upper bound 
on its L 2 -norm |je|j. (We have here implicitly placed ourselves in the "deterministic setting" customary to the Inverse 
Problems community. In the stochastic setting more familiar to statisticians, one assumes instead a bound on the 
variance of the components of e.) Therefore it is customary to take B IKlaE = L 2 (fl); even if the "true images" h (i.e. 
the images Kf of the possible objects /) lie in a much smaller space, we can only know them up to some (hopefully) 
small L 2 -distance. 

We shall consider in this paper a whole family of possible choices for S 0BJECT , but we shall always assume that these 
spaces are subspaces of a basic Hilbert space Tt (often an L 2 -space as well), and that if is a bounded operator from 
H to L 2 (f2). In many applications, K is an integral operator with a kernel representing the response of the imaging 
device; in the special case where this linear device is translation-invariant, K reduces to a convolution operator. 

To find an estimate of / from the observed g, one can minimize the discrepancy A(/), 

A(f) = \\Kf-g\\ 2 ; 
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functions that minimize A(/) are called pseudo-solutions of the inverse problem. If the operator K has a trivial 
null-space, i.e. if N(K) = {/ € H'.Kf = 0} = {0}, there is a unique minimizer, given by / = (K*K)~ 1 K*g, where 
K* is the adjoint operator. If N(K) ^ {0} it is customary to choose, among the set of pseudo-solutions, the unique 
element /t of minimal norm, i.e. f* = arg-min{||/||; / minimizes A(/)}. This function belongs to N(K) 1 - and is 
called the generalized solution of the inverse problem. In this case the map K> : g i— > f> is called the generalized 
inverse of K. Even when K*K is not invertible, K'g is well-defined for all g such that K*g E R(K*K). However, 
the generalized inverse operator may be unbounded (for so-called ill-posed problems) or have a very large norm (for 
ill-conditioned problems). In such instances, it has to be replaced by bounded approximants or approximants with 
smaller norm, so that numerically stable solutions can be defined and used as meaningful approximations of the true 
solution corresponding to the exact data. This is the issue of regularization. 

1.2 Regularization by imposing additional quadratic constraints 

The definition of a pseudo-solution (or even, if one considers equivalence classes modulo ~N(K), of a generalized 
solution) makes use of the inverse of the operator K*K; this inverse is well defined on the range K(K*) of K* when 
K* K is a strictly positive operator, i.e. when its spectrum is bounded below away from zero. When the spectrum of 
K*K is not bounded below by a strictly positive constant, K(K*K ) is not closed, and not all elements of K(K*) lie in 
R(K*K). In this case there is no guarantee that K*g £ K(K*K); even if K*g belongs to K(K*K), the unboundedness 
of (K*K)~ 1 can cause severe numerical instabilities unless additional measures are taken. 

This blowup or these numerical instabilities are "unphysical" , in the sense that we know a priori that the true 
object would not have had a huge norm in 7i, or other characteristics exhibited by the unconstrained "solutions". A 
standard procedure to avoid these instabilities or to regularize the inverse problem is to modify the functional to be 
minimized, so that it incorporates not only the discrepancy, but also the a priori knowledge one may have about the 
solution. For instance, if it is known that the object is of limited "size" in H, i.e. if < p , then the functional to 

be minimized can be chosen as 

A(/) + /i||/||^ = ||^/-5||i 2(a) +HI/ll« 
where /i is some positive constant called the regularization parameter. The minimizer is given by 

f ll = {K*K + iMl)- 1 K*g. (1) 

where / denotes the identity operator. The constant /i can then be chosen appropriately, depending on the application. 
If K is a compact operator, with singular value decomposition given by Kf = a k (f, v k) Uk > where (uk)ken an d 

(^fc)fceN are the orthonormal bases of eigenvectors of KK * and K*K, respectively, with corresponding eigenvalues a\, 
then (J) can be rewritten as 

oo 

= £ (9, u k) v k ■ (2) 

fc=l + l l 

This formula shows explicitly how this regularization method reduces the importance of the eigenmodes of K*K with 
small eigenvalues, which otherwise (if /i = 0) lead to instabilities. If an estimate of the "noise" is known, i.e. if we 
know a priori that g = Kf + e with ||e|| < e, then one finds from (J2J that 

11/ -/J < 

where T(p) — > as — > 0. This means that \x can be chosen appropriately, in an e-dependent way, so that the error 
in estimation ||/ — / M || converges to zero when e (the estimation of the noise level) shrinks to zero. This feature of the 
method, usually called stability, is one that is required for any regularization method. It is similar to requiring that a 
statistical estimator is consistent. 

Note that the "regularized estimate" / M of J2J is linear in g. This means that we have effectively defined a 
linear regularized estimation operator that is especially well adapted to the properties of the operator K; however, it 
proceeds with a "one method fits all" strategy, independent of the data. This may not always be the best approach. 
For instance, if H. is an i 2 -space itself, and K is an integral operator, the functions Uk and Vk are typically fairly 
smooth; if on the other hand the objects / are likely to have local singularities or discontinuities, an approximation of 
type @ (effectively limiting the estimates /„ to expansions in the first N Vk, with N determined by, say, a\ < fi/100 
for fc > N) will of necessity be a smoothened version of /, without sharp features. 

Other classical regularization methods with quadratic constraints may use quadratic Sobolev norms, involving a 
few derivatives, as the "penalty" term added to the discrepancy. This introduces a penalization of the highly oscillating 
components, which are often the most sensitive to noise. This method is especially easy to use in the case where K is 



/"(/, v k ) 



fc=i 



fc=i 



a,. 



(e,u k )v k 
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a convolution operator, diagonal in the Fourier domain. In this case the regularization produces a smooth cut-off on 
the highest Fourier components, independently of the data. This works well for recovering smooth objects which have 
their relevant structure contained in the lower part of the spectrum and which have spectral content homogeneously 
distributed across the space or time domain. However, the Fourier domain is clearly not the appropriate representation 
for expressing smoothness properties of objects that are either spatially inhomogeneous, with varying "local frequency" 
content, and/or present some discontinuities, because the frequency cut-off implies that the resolution with which the 
fine details of the solution can be stably retrieved is necessarily limited; it also implies that the achievable resolution 
is essentially the same at all points (see e.g. the book [2] for an extensive discussion of these topics). 

1.3 Regularization by non-quadratic constraints that promote sparsity 

The problems with the standard regularization methods described above are well known and several approaches have 
been proposed for dealing with them. We propose in this paper a regularization method that, like the classical 
methods just discussed, minimizes a functional obtained by adding a penalization term to the discrepancy; typically 
this penalization term will not be quadratic, but rather a weighted £ p -norm of the coefficients of / with respect to a 
particular orthonormal basis in 7i, with 1 < p < 2. More precisely, given an orthonormal basis (ipy)^ £F of 7i, and 
given a sequence of strictly positive weights w — (w 7 ) 76 r, we define the functional $ w .p by 

d> w , p (/) = A(/) + ^2w 7 \ (/, Vl ) \P = \\Kf gf + $>7l (/. ¥?7> T • (3) 

For the special case p — 2 and w 7 = /i for all 7 € T (we shall write this as w = /iw , where w is the sequence with 
all entries equal to 1), this reduces to the functional QJ. If we consider the family of functional ( E ) A iw ,p(/)j keeping the 
weights fixed at (i, but decreasing p from 2 to 1, we gradually increase the penalization on "small" coefficients (those 
with I (/, ify) I < 1) while simultaneously decreasing the penalization on "large coefficients" (for which | (/, ip^) \ > 1). 
As far as the penalization term is concerned, we are thus putting a lesser penalty on functions / with large but 
few components with respect to the basis (y> 7 ) gr , and a higher penalty on sums of many small components, when 
compared to the classical method of 0. This effect is the more pronounced the smaller p is. By taking p < 2, and 
especially for the limit value p = 1, the proposed minimization procedure thus promotes sparsity of the expansion of 
/ with respect to the <p 7 . (We shall not consider p < 1 here, because then the functional ceases to be convex.) 

The bulk of this paper deals with algorithms to obtain minimizers /* for the functional for general operators 
K. In the special case where K happens to be diagonal in the y? 7 -basis, Kip^ = k 7 (^ 7 , the analysis is easy and 
straightforward. Introducing the shorthand notation / 7 for (/, y 7 ) and g 1 for (g,tp>y), we have then 

$w, P (/) = [K/7 -5 7 | 2 +^7l/7l P ] • 
7er 

The minimization problem thus uncouples into a family of 1-dimensional minimizations, and is easily solved. Of 
particular interest is the almost trivial case where (i) K is the identity operator, (ii) w = /jw and (iii) p = 1, which 
corresponds to the practical situation where the data g are equal to a noisy version of / itself, and we want to remove 
the noise (as much as possible), i.e. we wish to denoise g. In this case the minimizing /* is given by 

/* = EAVv = E^(S>7 ' (4) 
7er 7 er 

where is the (nonlinear) thresholding operator from R to M defined by 

{x + fi/2 if x<-fi/2 
if |x|<m/2 (5) 

x-fx/2 if x>n/2. 

(We shall revisit the derivation of (@J below. For simplicity, we are assuming that all functions are real-valued. If the 
f~j are complex, a derivation similar to that of Q then leads to a complex thresholding operator, which is defined as 
S^re 10 ) = S^{r)e tB ] see Remark ESI below.) 

In more general cases, especially when K is not diagonal with respect to the c/? 7 -basis, it is not as straightforward 
to minimize 

An approach that promotes sparsity with respect to a particular basis makes sense only if we know that the objects 
/ that we want to reconstruct do indeed have a sparse expansion with respect to this basis. In the next subsection 
we list some situations in which this is the case and to which the algorithm that we propose in this paper could be 
applied. 
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1.4 Possible applications for sparsity-promoting constraints 
1.4.1 Sparse wavelet expansions 

This is the application that was the primary motivation for this paper. Wavelets provide orthonormal bases of L 2 (M. d ) 
with localization in space and in scale; this makes them more suitable than e.g. Fourier expansions for an efficient 
representation of functions that have space- varying smoothness properties. Appendix^gives a very succinct overview 
of wavelets and their link with a particular family of smoothness spaces, the Besov spaces. Essentially, the Besov 
space Bp q (R d ) is a space of functions on R d that "have s derivatives in i p (i? d )"; the index q provides some extra 
fine-tuning. The precise definition involves the moduli of continuity of the function, defined by finite differencing, 
instead of derivatives, and combines the behavior of these moduli at different scales. The Besov space q (M. d ) is 
well-defined as a complete metric space even if the indices p, q € (0, oo) are < 1, although it is no longer a Banach 
space in this case. Functions that are mostly smooth, but that have a few local "irregularities" , nevertheless can still 
belong to a Besov space with high smoothness index. For instance, the 1-dimensional function F(x) = sign(a;) e~ x 
can belong to q (R.) for arbitrarily large s, provided < p < (s + i) . (Note that this same example does not 
belong to any of the Sobolev spaces (K) with s > 0, mainly because these can be defined only for p > 1.) Wavelets 
provide unconditional bases for the Besov spaces, and one can express whether or not a function / on M. d belongs to a 
Besov space by a fairly simple and completely explicit requirement on the absolute values of the wavelet coefficients 
of /. This expression becomes particularly simple when p = q; as reviewed in Appendix ^] / <= B^ „(M) if and only if 
(see Appendix [XJ 




where a depends on s,p and is defined by a = s + d ( | — i j , and where |A| stands for the scale of the wavelet "$>\. 
(The i in the formula for a is due to the choice of normalization of the ||^ / a|| Jj2 = !•) For p = q > 1, [[ | s p is an 
equivalent norm to the standard Besov norm on B^ q (Mr); we shall restrict ourselves to this case in this paper. 

It follows that minimizing the variational functional for an inverse problem with a Besov space prior falls exactly 
within the category of problems studied in this paper: for such an inverse problem, with operator K and with the a 
priori knowledge that the object lies in some B^ , it is natural to define the variational functional to be minimized by 

A(/) + 1/1^ = \\Kf g\\ 2 + £ 2^1 1 (/, ¥ A ) |* , 

AeA 

which is exactly of the type <J> WiP (/), as defined in J2J). For the case where K is the identity operator, it was noted 
already in B. that the wavelet-based algorithm for denoising of data with a Besov prior, derived earlier in J7|, amounts 
exactly to the minimization of <& fJ , v , gt i(f), where K is the identity operator and the c/? 7 -basis is a wavelet basis; the 
denoised approximant given in 17. then coincides exactly with |@| |3J) ■ 

It should be noted that if d > 1, and if we are interested in functions that are mostly smooth, with possible jump 
discontinuities (or other "irregularities") on smooth manifolds of dimension 1 or higher (i.e. not point irregularities), 
then the Besov spaces do not constitute the optimal smoothness space hierarchy. For d = 2, for instance, functions / 
that are C°° on the square [0, l] 2 , except on a finite set of smooth curves, belong to B\ 1 ([0, l] 2 ), but not to Bf 1 ([0, l] 2 ) 
for s > 1. In order to obtain more efficient (sparser) expansions of this type of functions, other expansions have to be 
used, using e.g. ridgelets or curvelets (^H], 0)- O ne can then again use the approach in this paper, with respect to 
these more adapted bases. 

1.4.2 Other orthogonal expansions 

The framework of this paper applies to enforcing sparsity of the expansion of the solution on any orthonormal basis. 
We provide here three examples which are particularly relevant for applications, but this is of course not limitative. 

The first example is the case where it is known a priori that the object to be recovered is sparse in the Fourier 
domain, i.e. / has only a few nonzero Fourier components. It makes then sense to choose a standard Fourier basis 
for the tp-y, and to apply the algorithms explained later in this paper. (They would have to be adapted to deal with 
complex functions, but this is easily done; see Remark 12 . 51 below. ) In the case where K is the identity operator, this is 
a classical problem, sometimes referred to as "tracking sinusoids drowned in noise" , for which many other algorithms 
have been developed. 

For other applications, objects are naturally sparse in the original (space or time) domain. Then our framework can 
be used again if we expand such objects in a basis formed by the characteristic functions of pixels or voxels. Once the 
inverse problem is discretized in pixel space, it is regularized by penalizing the i p -norm of the object with 1 < p < 2. 
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Possible applications include the restoration of astronomical images with scattered stars on a flat background. Objects 
formed by a few spikes are also typical of some inverse problems arising in spectroscopy or in particle sizing. In medical 
imaging, ^ p -norm penalization with p larger than but close to 1 has been used e.g. for the restoration of tiny blood 
vessels [25] . 

The third example refers to the case where K is compact and the use of SVD expansions is a viable computational 
approach, e.g. for the solution of relatively small-scale problems or for operators that can be diagonalized in an 
analytic way. As already stressed above, the linear regularization methods as given e.g. by (J2J) have the drawback that 
the penalization or cut-off eliminates the components corresponding to the smallest singular values, independently 
of the type of data. In some instances, the most significant coefficients of the object may not correspond to the 
largest singular values; it may then happen that the object possesses significant coefficient beyond the cut-off imposed 
by linear methods. In order to avoid the elimination of such coefficients, it is preferable to use instead a nonlinear 
regularization analogous to |@|EJ; with basis functions tp~ replaced by the singular vectors v k . The theorems in this 
paper show that the thresholded SVD expansion 

/* = V S /a 2 ( k )v k = V — S* u (er fc (g, u k ))v k , 

which is the minimizer of the functional © with w = /iw and p = 1, provides a regularized solution that is better 
adapted to these situations. 

1.4.3 Frame expansions 

In a Hilbert space TL, a frame {if> n } n eN is a set of vectors for which there exist constants A,B > so that, for all 

v G n, 

B-^Kv^n)] 2 <\\V\\ 2 ^A-^Kv^n)] 2 ■ 

Frames always span the whole space TL, but the frame vectors ip n are typically not linearly independent. Frames were 
first proposed by Duffin and Schaeffer in 18 1; they are now used in a wide range of applications. For particular choices 
of the frame vectors, the two frame bounds A and B are equal; one has then, for all v 6 TL, 

V = A' 1 ( v , i>n) ■ (6) 

In this case, the frame is called tight. An easy example of a frame is given by taking the union of two (or more) 
different orthonormal bases in TL; these unions always constitute tight frames, with A = B equal to the number of 
orthonormal bases used in the union. 

Frames are typically "overcomplete" , i.e. they still span all of TL even if some frame vectors are removed. It follows 
that, given a vector v in TL, one can find many different sequences of coefficients such that 

v = ^2 z n ip n . (7) 

nGN 

Among these sequences, some have special properties for which they are preferred. There is, for instance, a standard 
procedure to find the unique sequence with minimal £ 2 -norm; if the frame is tight, then this sequence is given by 
z n = A^ 1 (v,ip n ), as in ©. 

The problem of finding sequences z = (z„) n gN that satisfy Q can be considered as an inverse problem. Let us 
define the operator K from £ 2 (N) to TL that maps a sequence z = (z„)„ g n to the element Kz of H by 

Then solving JJJ) amounts to solving Kz = v. Note that this operator K is associated with, but not identical to what 
is often called the "frame operator" . One has, for v G TL , 

KK*v = ^ ( w ' i'n) i^n ; 

nGN 

for z E £ 2 , the sequence K*Kz is given by 
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In this framework, the sequence z of minimum ^ 2 -norm that satisfies Q is given simply by z^ = K'v. The standard 
procedure in frame lore for the construction of z^ can be rewritten as = K*(KK*)~ 1 v, so that K> = K*(KK*)^ 1 
in this case. This last formula holds because this inverse problem is in fact well-posed: even though N(K ) ^ {0}, there 
is a gap in the spectrum of K*K between the eigenvalue and the remainder of the spectrum, which is contained 
in the interval [A, B]; the operator KK* has its spectrum completely within [A,I?]. In practice, one always works 
with frames for which the ratio B/A is reasonably close to 1, so that the problem is not only well-posed but also 
well-conditioned. 

It is often of interest, however, to find sequences that are sparser than z* . For instance, one may know a priori 
that v is a "noisy" version of a linear combination of tp n with a coefficient sequence of small € 1 -norm. In this case, it 
makes sense to determine a sequence z M that minimizes 

WKm-vWb+nMe. , (8) 

a problem that falls exactly in the category of problems described in subsection 1.3. Note that although the inverse 
problem for K from ^ 2 (N) to 7i is well-defined, this need not be the case with the restriction K \ gl from £ X (N) to Ti. 
One can indeed find tight frames for which suplHzH^ ; z 6 t 1 and \\Kz\\ < 1} = oo, so that for arbitrarily large R 
and arbitrarily small e, one can find v € H, z E I 1 with \\v — Kz\\ — e, yet inf{||z|| j ; ||5 — Kz\\ < e/2} > i?||z||^i. In 
a noisy situation, it therefore may not make sense to search for the sequence with minimal ^-norm that is "closest" 
to v; to find an estimate of the ^-sequences of which a given v is known to be a small perturbation, a better strategy 
is to compute the minimizer of ©. 

Minimizing the functional JSJ) as an approach to obtain sequences that provide sparse approximations Kz to v was 
proposed and applied to various problems by Chen, Donoho and Saunders J?J; in the statistical literature, least-squares 
regression with ^-penalty is known as the "lasso" |32| . The algorithm in this paper provides thus an alternative to 
linear and quadratic programming techniques for these problems, which all amount to minimizing JHj. 

1.5 A summary of our approach 

Given an operator K from TL to itself (or, more generally, from Ti. to Ti.'), and an orthonormal basis ((^ 7 ) 7 gr, our goal 
is to find minimizing /* for the functionals 'I'w.p defined in section 1.3. The corresponding variational equations are 

V 7 S r : (K*Kf, p 7 ) - {K* 9 , p 7 ) + ^ | (/, <pj ^sign^, <pj) = . (9) 

When p 7^ 2 and K is not diagonal in the c/? 7 -basis, this gives a coupled system of nonlinear equations for the (/, <^ T ), 
which it is not immediately clear how to solve. To bypass this problem, we shall use a sequence of "surrogate" 
functionals that are each easy to minimize, and for which we expect, by a heuristic argument, that the successive 
minimizers have our desired /* as a limit. These surrogate functionals are introduced in section 2 below. In section 
3 we then show that their successive minimizers do indeed converge to /*; we first establish weak convergence, but 
conclude the section by proving that the convergence also holds in norm. In section 4 we show that our proposed 
iterative method is stable, in the sense given in subsection 1.2: if we apply the algorithm to data that are a small 
perturbation of a "true image" Kf Q , then the algorithm will produce /* that converge to f as the norm of the 
perturbation tends to zero. 

1.6 Related work 

Exploiting the sparsity of the expansion on a given basis of an unknown signal, in order to assist in the estimation 
or approximation of the signal from noisy data, is not a new idea. The key role played by sparsity to achieve 
superresolution in diffraction-limited imaging was already emphasized by Donoho |14| more than a decade ago. Since 
the seminal paper by Donoho and Johnstone |17j . the use of thresholding techniques for sparsifying the wavelet 
expansions of noisy signals in order to remove the noise (the so-called "denoising" problem) has been abundantly 
discussed in the literature, mainly in a statistical framework (see e.g. the book |28p. Of particular importance for the 
background of this paper is the article by Chambolle et al. 0, which provides a variational formulation for denoising, 
through the use of penalties on a Besov-norm of the signal; this is the perspective adopted in the present paper. 

Several attempts have been made to generalize the denoising framework to solve inverse problems. To overcome 
the coupling problem stated in the preceding subsection, a first approach is to construct wavelet- or "wavelet-inspired" 
bases that are in some sense adapted to the operator to be inverted. The so-called Wavelet- Vaguelette Decomposition 
(WVD) proposed by Donoho (TJ|, as well as the twin Vaguelette- Wavelet Decomposition method 1 , and also the 
deconvolution in mirror wavelet bases 22 , 28 can all be viewed as examples of this strategy. For the inversion of the 
Radon transform, Lee and Lucier formulated a generalization of the WVD decomposition that uses a variational 
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approach to set thresholding levels. A drawback of these methods is that they are limited to special types of operators 
K (essentially convolution- type operators under some additional technical assumptions). 

Other papers have explored the application of Galerkin-type methods to inverse problems, using an appropriate 
but fixed wavelet basis |KSI 1271 U|| • The underlying intuition is again that if the operator lends itself to a fairly sparse 
representation in wavelets, e.g. if it is an operator of the type considered in 0], and if the object is mostly smooth 
with some singularities, then the inversion of the truncated operator will not be too onerous, and the approximate 
representation of the object will do a good job of capturing the singularities. In 9 , the method is made adaptive, so 
that the finer-scale wavelets are used where lower scales indicate the presence of singularities. 

The mathematical framework in this paper has the advantage of not pre-supposing any particular properties for 
the operator K (other than boundedness) or the basis (</? 7 ) 7 6r (other than its orthonormality) . We prove, in complete 
generality, that generalizing Tikhonov's regularization method from the £ 2 -penalty case to a ^-penalty (or, more 
generally, a weighted £ p -penalty with 1 < p < 2) provides a proper regularization method for ill-posed problems 
in a Hilbert space Ti, with estimates that are independent of the dimension of Ti (and are thus valid for infinite- 
dimensional separable Ti). To our knowledge, this is the first proof of this fact. Moreover, we derive a Landweber-type 
iterative algorithm that involves a denoising procedure at each iteration step and provides a sequence of approximations 
converging in norm to the variational minimizer, with estimates of the rate of convergence in particular cases. This 
algorithm was derived previously in jlOj . using, as in this paper, a construction based on "surrogate functionals" . 
During the final editing of the present paper, our attention was drawn to the independent work by Figueiredo and 
Nowak |21l \'M\\ . who, working in the different (finite-dimensional and stochastic) framework of Maximum Penalized 
Likelihood Estimation for inverting a convolution operator acting on objects that are sparse in the wavelet domain, 
derive essentially the same iterative algorithm as in |10j and this paper. 



2 An iterative algorithm through surrogate functionals 

It is the combined presence of K* K f (which couples all the equations) and the nonlinearity of the equations that 
makes the system ® unpleasant. For this reason, we borrow a technique of optimization transfer (see e.g. |23] and 
and construct surrogate functionals that effectively remove the term K*K f . We first pick a constant C so that 
< C, and then we define the functional S(/;a) = C\\f — a\\ 2 — \\K f — Ka\\ 2 which depends on an auxiliary 
element a of Ti. Because CI— K*K is a strictly positive operator, S(/; a) is strictly convex in / for any choice of a. If 
\\K\\ < 1, we are allowed to set C = 1; for simplicity, we will restrict ourselves to this case, without loss of generality 
since K can always be renormalized. We then add S(/; a) to < I ) w.p(/) to form the following "surrogate functional" 



*w, P (/;o) 



$w, P (/)-ii^/-^aii 2 + n/- a i: 



\Kf-g\\ 2 + J2w~f\(L 



tp 7 ) 



7 er 



\Kf-Ka\\ 2 + \\f-a\\ 



= Il/H 2 - 2 (/, a + K*g - K*Ka) + ]T w,\ (/, Vl ) |" + || 5 || 2 



lall 2 - ll^all 2 



]T [f* - 2/ 7 (a + K*g - K*Ka), ( + w y \f^ 



|a|| 2 -||X a || 2 



(10) 



where we have again used the shorthand w 7 for (v,ip y ), and implicitly assumed that we are dealing with real functions 
only. Since S(/;a) is strictly convex in /, <3> w (/;a) is also strictly convex in /, and has a unique minimizer for any 
choice of a. The advantage of minimizing (|10|) in place of © is that the variational equations for the / 7 decouple. 
We can then try to approach the minimizer of $ w ,p(/) by an iterative process which goes as follows: starting from 
an arbitrarily chosen / , we determine the minimizer f 1 of (|1(J|) for a = / ; each successive iterate /" is then the 
minimizer for / of the surrogate functional i|10[l anchored at the previous iterate, i.e. for a — f n ~ 1 . The iterative 
algorithm thus goes as follows 



f arbitrary ; /" = arg-min ($^(/; /" r ) 



n = 1,2, 



(11) 



To gain some insight into this iteration, let us first focus on two special cases. 

In the case where w = (i.e. the functional <I> w .p reduces to the discrepancy only), one needs to minimize 

<?(/; ^) = H/ll 2 - 2 </> ^ + K *te - K f n ^)) + llffll 2 + lir^ll 2 - Wkt-T ; 



this leads to 



K*{g-Kr- 1 ) 
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This is nothing else than the so-called Landweber iterative method, the convergence of which to the (generalized) 
solution of Kf = g is well-known (12111; see also [2], |2l)|). 

In the case where w = /iw and p = 2, the n-th surrogate functional reduces to 



<2(/; P- X ) = (1 + M) ll/ll 2 - 2 </, + K*(g - Kf*- 1 )) + hf + Wf^W 2 - WKf^f ; 
the minimizer is now 

r = -^—[p- 1 +K*{g-Kp- 1 )] , (12) 
1 + n 

i.e. we obtain a damped or regularized Landweber iteration (see e.g. [2]). The convergence of the function f n defined 
by (H2J follows immediately from the estimate = (l+^IKl- A*A)(/ Tl -/"- 1 )|| < (I+m) -1 !!/"-/"" 1 !!, 

showing that we have a contractive mapping, even if N(A') ^ {0}. 

In these two special cases we thus find that the /" converge as n — > oo. This permits one to hope that the /" will 
converge for general w,p as well; whenever this is the case the difference ||/ n — / n_1 || 2 — \\K(f n — /" _1 )|| 2 between 
®w p(/ n ! / n_1 ) an( i < i ) w.p(/™) tends to zero asm oo, suggesting that the minimizer /" for the first functional could 
well tend to a minimizer /* of the second. In section 3 we shall see that all this is more than a pipe-dream; i.e. we 
shall prove that the /" do indeed converge to a minimizer of ^w.p. 

In the remainder of this section, we derive an explicit formula for the computation of the successive We first 
discuss the minimization of the functional IjlUfl for a generic a £ TL. As already noticed, the variational equations for 
the / 7 decouple. For p > 1, the summand in (|10|l is differentiable in / 7 , and the minimization reduces to solving the 
variational equation 

2/ 7 +p W7 sign(/ 7 )|/ 7 r 1 = 2(a 7 + [K*(g - Aa)] 7 ) ; 

since for any w > and any p > 1, the real function F w<p (x) = x + ^ sign(a;)|a;| p ~ 1 is a one-to-one map from R to 
itself, we thus find that the minimizer of (|10|) satisfies 

f- ( = S w ^. p (a- ( + [K*(g - Ka)} 7 ) , (13) 

where S WiP is defined by 

S w ^ P = (F w ^ p y 1 , forp>l. (14) 

When p = 1, the summand of I|1U|) is differentiable in f 1 only if f 1 ^ 0; except at the point of non-differentiability, 
the variational equation now reduces to 

2/ 7 + u h sign(/ 7 ) = 2(a 7 + [K*(g - Aa)] 7 ) . 

For / 7 > 0, this leads to / 7 = a 7 + [K*{g — Aa)] 7 — w 7 /2; for consistency we must impose in this case that 
a^ + [K* (g— Ka)]j > w 1 /2. For / 7 < 0, we obtain / 7 = a y + {K*(g—Ka)} y +w 1 /2, valid only when a 1 + [K*(g—Ka)} 1 < 
— w 7 /2. When a 7 + [K*(g — A"a)] 7 does not satisfy either of the two conditions, i.e. when |a 7 + [K*(g — Aa)] 7 | < w 7 /2, 
we put / 7 = 0. Summarizing, 

A = S w , iA (a 7 + [K*(g - Aa)] 7 ) , (15) 
where the function S Wt i from R to itself is defined by 

( x-w/2 if x>w/2 
S w ,i(x) = I if |x| < w/2 (16) 
{ x + w/2 if x < -w/2 . 

(Note that this is the same nonlinear function as encountered earlier in section 1.3, in definition (0.) 

The following proposition summarizes our findings, and proves (the case p = 1 is not conclusively proved by the 
variational equations above) that we have indeed found the minimizer of p (f; a): 

Proposition 2.1 Suppose the operator A maps a Hilbert space TL to another Hilbert space TL 1 , with || A*A"|| < 1, and 
suppose g is an element oj TL' . Let (i^ 7 ) 7g r be an orthonormal basis for TL, and let w = (w 7 ) 7e r be a sequence of 
strictly positive numbers. Pick arbitrary p > 1 and a G TL. Define the functional $ w p (f;a) on TL by 

<;(/; a) = || Kf - gf + £ w,\f,\ P + 11/ - «!| 2 " \\K(f ~ a)|| 2 . 
Then 3> w p (f; a) has a unique minimizer in TL. 

This minimizer is given by f = S WjP (a + K*(g — Ka)), where the operators S WiP are defined by 

S w , p (h) = ^ Sw^Ahi)^! > ( 17 ) 



s 



with the functions S WtP from K to itself given by (|14llltj|) . For all h G Ti, one has 

^ p (f + h;a)>^ p (f;a) + \\h\\ 2 . 

Proof: The cases p > 1 and p — I should be treated slightly differently. We discuss here only the case p = 1; the 
simpler case p > 1 is left to the reader. 

Take /' = / + /i, where / is as defined in the Proposition, and h G Ti is arbitrary. Then 

+ A; a) = <*(/; a) + 2(h,f-a- K*(g - Ka)) + £ w 7 (|/ 7 + fc 7 | - |/ 7 |) + \\hf . 

Define now r o = {7 G T; / 7 = 0}, and T 1 =T\T . Substituting the explicit expression Ijl5(l for the / 7 , we have then 

+ h;a)- <i(/;a) = ||^|| 2 + £ K|h 7 | - 2fr 7 (a 7 + [K*(g - Ka)} 7 )} 

7er 

+ ( w f\fy + ^7 1 ~ w i\fi\ + h 1 [-w 1 si S n (/7)]) ■ 

76^ 

For 7 G r 0) 2|a 7 + [if* (5 - lfa)] 7 | < io 7 , so that iy 7 |/i 7 | - 2/i 7 (a 7 + [JC*(g - Ka)}^) > 0. 
If 7 G I\, we distinguish two cases, according to the sign of / 7 . If / 7 > 0, then 

w i\fi + h-y\ ~ ^7 l/y I + h-y[~ w i s ig n (/7)] = w i\\,fi + h-y\ ~ (ft + h~r)] ^ 0- If ft < 0j then iw 7 |/ 7 + /i 7 | — u> 7 |/ 7 | + 
h^[-w 1 sign(/ 7 )] = io 7 [|/ 7 + h 7 \ + (/ 7 + ft 7 )] > 0. 

It follows that $ w ± (f + h;a) — $ w a) > \\h\\ 2 , which proves the Proposition. | 

For later reference it is useful to point out that 
Lemma 2.2 The operators S WtP are non-expansive, i.e. Vi>, v' G Ti, 1 1 S W:P f — S WiP z/|| < \\v — v'\\ . 
Proof: As shown by (|17fl . 

||S W! pU — S w .p« || 2 = S ' |iS tU7) p(u 7 ) — >Stu 7 ,p(i> 7 )P , 
7er 

which means that it suffices to show that, Vx, x' G M, and all w > 0, 

|S™, P (x) - S^x')! < |x-x'| . (18) 

If p > 1, then 5m )P is the inverse of the function F WyP ; since is differcntiable with derivative uniformly bounded 
below by 1, l|18|) follows immediately in this case. 

If p = 1, then S w .\ is not differcntiable in x = w/2 or x = —w/2, and another argument must be used. For the 
sake of definiteness, let us assume x > x'. We will just check all the possible cases. If x and x' have the same sign 
and |x|, \x'\ > w/2, then \S w<p (x) — S w>p (x')\ = \x — x'\. If x' < —w/2 and x > w/2, then \S WiP (x) — S w , p (x')\ — 
x + \x'\ — w < \x — x'\. If x > w/2 and \x'\ < w/2, then \S WiP (x) — S W;P (x')\ — x — w/2 < \x — x'\. A symmetric 
argument applies to the case \x\ < w/2 and x' < —w/2. Finally, if both |x| and |x'| are less than w/2, we have 
\Sw,p(%) — S w . p (x')\ — < |x — x'\. This establishes (118(1 in all cases. | 



Having found the minimizer of a generic $ w p (f; a), we can apply this to the iteration l|lljl. leading to 

Corollary 2.3 Let Ti, Ti' , K , g, w and (y 7 ) 7 gr be as in Provosition \2.1\ Pick f° in Ti, and define the functions f n 
recursively by the algorithm (lllfl . Then 

f n = S v/tP (r- 1 +K*(g-Kf n - 1 )) . (19) 

Proof: this follows immediately from Proposition ^. II | 



Remark 2.4 In the argument above, we used essentially only two ingredients: the (strict) convexity of |j/ — a\\ 2 — 
\\K(f — a)\\ 2 , and the presence of the negative — ||if/|| 2 term in this expression, canceling the ||if/|| 2 in the original 
functional. We can use this observation to present a slight generalization, in which the identity operator used to upper 
bound K*K is replaced by a more general operator D that is diagonal in the <p 7 -basis, 

D ifj = d 1 ip 1 , 



9 



and that still gives a strict upper bound for K*K, i.e. satisfies 

D > K*K + rjl for some r] > . 
In this case, the whole construction still carries through, with slight modifications; the successive /" are now given 

by 

,„ _ „ ( ,„-i , [K*{g-Kr-^ \ 

J j ^iv y /dy,p yJj t ^ y • 

Introducing the notation w/d for the sequence (w 7 /d 7 ) 7 , we can rewrite this as 

/" - S wMp + D- l [K*(g - KP- 1 )]) . 

For the sake of simplicity of notation, we shall restrict ourselves to the case D = I. 

Remark 2.5 If we deal with complex rather than real functions, and the / 7 , (K*g) 1 ,- ■ ■ are complex quantities, 
then the derivation of the minimizer of $ w \(f;a) has to be adapted somewhat. Writing / 7 = r 1 e t0 '< , with r 7 > 0, 
6> 7 S [0, 2tt), and likewise (a + - K*Ka) 1 = R 1 e ie ~< , we find, instead of (fTTTf) - 

<" P (/; «) - + " 2r 7 i? 7 cos(0 7 - 9 7 )] + ||.g|| 2 + ||a|| 2 - ||iva|| 2 . 

7 

Minimizing over r 7 S [0, c») and # 7 G [0, 27r) leads to # 7 = 7 and r 7 = iSu, iP (i? 7 ). If we extend the definition of 
to complex arguments by setting S , AtjP (re l6 ') = S , MiP (r)e ie , then this still leads to / 7 = S Wj , p (a 7 + [K*(g — Ka)) 7 ), as 
in 1131 115(1 . The arguments of the different proofs still hold for this complex version, after minor and straightforward 
modifications. 



3 Convergence of the iterative algorithm 

In this section we discuss the convergence of the sequence (/ n )neN defined by (|19J) . The main result of this section is 
the following theorem: 

Theorem 3.1 Let K be a bounded linear operator from Ti. to TC , with norm strictly bounded by 1. Take p € [1,2], 
and let S w p be the shrinkage operator defined by H17|) . where the sequence w = (w) 7 ) 7 gr * s uniformly bounded below 
away from zero, i.e. there exists a constant c > such that V7 € T : w~ ( > c. Then the sequence of iterates 

f n = S w , p {r- 1 + K*(g-Kf n - 1 )) , n = l,2,..., 

with f arbitrarily chosen in Ti, converges strongly to a minimizer of the functional 



^M) = \\Kf-9\\ 2 + 



II w.p 



whe 



denotes the 



l|w,p 



7er 



^i</^7> r 



i/p 



1 < p < 2 



(20) 



// either p > 1 or N(_ftT) = {0} ; then the minimizer f* of $ w ,p is unique, and every sequence of iterates f n converges 
strongly to /*, regardless of the choice of f° . 

By "strong convergence" we mean convergence in the norm of Ti., as opposed to weak convergence. This theorem 
will be proved in several stages. To start, we prove weak convergence, and we establish that the weak limit is indeed a 
minimizer of $ WiP . Next, we prove that the convergence holds in norm, and not only in the weak topology. To lighten 
our formulas, we introduce the shorthand notation 

Tf = S w , p (f + K*(g-Kf)) ; 

with this new notation we have /" = T n /°. 
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3.1 Weak convergence of the f n 

To prove weak convergence of the /" = T™/ , we apply the following theorem, due to Opial : 
Theorem 3.2 Let the mapping A from TL to TL satisfy the following conditions: 

(i) A is non- expansive: \fv,v' £ TL, ||Au — Au'|| < \\v — v'\\, 

(ii) A is asymptotically regular: \fv £ TL, ||A ,l+1 u — A n w|| > , 

n — >oo 

(iii) the set T of the fixed points of A in TL is not empty. 

Then, Vw £ TL, the sequence (A n v) ne jq converges weakly to a fixed point in T . 

Opial's original proof can be simplified; we provide the simplified proof (still mainly following Opial's approach) 
in Appendix [5] (The theorem is slightly more general than what is stated in Theorem 13.21 in that the mapping A 
need not be defined on all of space; it suffices that it map a closed convex subset of TL to itself - see Appendix [5] 
pTTj also contains additional refinements, which we shall not need here.) One of the Lemmas stated and proved in the 
Appendix will be invoked in its own right, further below in this section; for the reader's convenience, we state it here 
in full as well: 

Lemma 3.3 Suppose the mapping A from TL to TL satisfies the conditions (i) and (ii) in Theorem 13.21 Then, if a 
subsequence of (A"w) ng N converges weakly in TL, then its limit is a fixed point of A. 

In order to apply Opial's Theorem to our nonlinear operator T, we need to verify that it satisfies the three 
conditions in Theorem 13. 21 We do this in the following series of lemmas. We first have 

Lemma 3.4 The mapping T is non- expansive, i. e. \fv,v' £ TL 

\\Tv-Tv'\\ < ||v-u'|| • 

Proof: It follows from Lemma 12 . 21 that the shrinkage operator S w p is non-expansive. Hence we have 

||Tu-Tu'|| < \\(I-K*K)v-(I-K*K)v'\\ 
< \\I-K*K\\ \\v-v'\\ < \\v-v'\\ 

because we assumed ||-£T|| < 1. | 

This verifies that T satisfies the first condition (i) in Theorem 13.21 To verify the second condition, we first prove 
some auxiliary lemmas. 



Lemma 3.5 Both ($ w , P (/"))„ eN and (<™(/ B+1 ;/ n ) 



are non-increasing sequences. 



Proof: For the sake of convenience, we introduce the operator L = vT— K*K, so that \\h\\ 2 — \\Kh\\ 2 = \\Lh\\ 
Because f n+1 is the minimizer of the functional <i> w p (f; f n ) and therefore 

<wr +1 ) + \m n+l - mi 2 = Or +1 ; /") ^ <C(/ n ; f n ) - wn , 

we obtain 

<wr +1 ) < $w,p(D . 

On the other hand 

<p(/ n+2 ; r +1 ) < <fw,p(r +1 ) < <wr +1 ) + \\L(r +i - mi 2 = <Or +i ; n . 



Lemma 3.6 Suppose the sequence w = (w 7 ) 76 r is uniformly bounded below by a strictly positive number. Then the 
\\f n \\ are bounded uniformly in n. 
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Proof: Since u> 7 > c, uniformly in 7, for some c > 0, we have 

by Lemma 13.51 Hence the /" are bounded uniformly in the ||| | WjP -norm. Since 



ll/H 2 < c-^ m|x[ W ( 2 ^|/ 7 M 11/11^^ < c-^lff-P I/I^ p = c-^l/ll^ , (21) 
we also have a uniform bound on the ||/™||. | 



Lemma 3.7 The series J2^=o — /"II 2 * s convergent. 

Proof: This is a consequence of the strict positive-definiteness of L, which holds because \\K\\ < 1. We have, for any 

jVeN, 

TV TV 
£ ||r+ l _ r|| 2 < £ mf n+l /n) ||2 

n=0 n=0 

where v4 is a strictly positive lower bound for the spectrum of By Lemma 13.51 

TV TV 

E H L (/ n+1 - /")n 2 ^ Etwn - wr +1 )] = - w/^ 1 ) < <w/°) , 

n=0 n=0 

where we have used that ($ w ,p(/"))nGN is a non-increasing sequence. 

It follows that Yl n =o ll/" +1 — /™l| 2 i s bounded uniformly in N, so that the infinite series converges. | 

As an immediate consequence, we have that 
Lemma 3.8 The mapping T is asymptotically regular, i.e. 

||T n+1 /°-T n /°|| = ||/" +1 -/l ^0 for n->oo. 
We can now establish the following 

Proposition 3.9 The sequence f n — T"/°, n = 1, 2, • • ■ converges weakly, and its limit is a fixed point for T. 

Proof: Since, by Lemma l3~Hl the /" = T"/° are uniformly bounded in n, it follows from the Banach-Alaoglu theorem 
that they have a weak accumulation point. By Lemma |3. 31 this weak accumulation point is a fixed point for T. It 
follows that the set of fixed points of T is not empty. Since T is also non-expansive (by Lemma EPJl and asymptotically 
regular (by Lemma 13. 8|) , we can apply Opial's theorem (Theorem 13.11 above) , and the conclusion of the Proposition 
follows. I 

By the following proposition this fixed point is also a minimizer for the functional $ w ,p- 
Proposition 3.10 A fixed point for T is a minimizer for the functional "Jw.p- 

Proof: If /* = T/*, then by Proposition l2.il we know that /* is a minimizer for the surrogate functional <E> W p (f; /*), 
and that, VhEH, 

<Or +h-,n> Or ; n + \\h\\ 2 . 



Observing that $ W)P (/*; /*) = $ w>p (/*), and 

<Or + h-, n = <wr + h ) + ii^ii 2 - ira 2 . 

we conclude that, \/h 6 H, < & w ,p(/* + h) > $ w , p (/*) + ||-ft^|| 2 , which shows that /* is a minimizer for $(/). | 

The following proposition summarizes this subsection. 

Proposition 3.11 (Weak convergence) Make the same assumptions as in the statement of Theorem 13.11 Then, for 
any choice of the initial f , the sequence f n = T n / , n = 1, 2, • • • converges weakly to a minimizer for Q-w.p. If either 
N(K) — {0} or p > 1, then ^ w , p has a unique minimizer f* , and all the sequences (f n ) n eN converge weakly to f* , 
regardless of the choice of f° . 
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Proof: The only thing that hasn't been proved yet above is the uniqueness of the minimizer if N(K) = {0} or p > 1. 
This uniqueness follows from the observation that |||/||| w ,p is strictly convex in / if p > 1, and that \\Kf — g\\ 2 is strictly 
convex in / if N(A") = {0}. In both these cases $ w ,p is thus strictly convex, so that it has a unique minimizer. g 

Remark 3.12 If one has the additional prior information that the object lies in some closed convex subset C of the 
Hilbert space H,, then the iterative procedure can be adapted to take this into account, by replacing the shrinkage 
operator S by PcS, where Pc is the projector on C. For example, if Ti. = L 2 , then C could be the cone of functions that 
are positive almost everywhere. The results in this subsection can be extended to this case; a more general version of 
Theorem 13.21 can be applied, in which A need not be defined on all of Tt, but only on C C H; see Appendix IB1 Wc 
would however need to either use other tools to ensure, or assume outright that the set of fixed points of T = P^S is 
not empty (see also 



Remark 3.13 If is strictly convex, then one can prove the weak convergence more directly, as follows. By the 
boundedness of the /" (Lemma I3.6[l . we must have a weakly convergent subsequence (f Uk )kefi- By Lemma 13.81 the 
sequence (f nk+1 )keN must then also be weakly convergent, with the same weak limit /. It then follows from the 
equation 

/ 7 " fc+1 = S w ^ p (/»» + [K*(g - Kr h )]-y) , 

together with lim^oo f" k = lim^oo /™ fc+1 = / 7 , that / must be the fixed point /* of T. Since this holds for any 
weak accumulation point of (/™)neN) the weak convergence of (f n ) n &i to /* follows. 

Remark 3.14 The proof of Lemma [3.61 is the only place, so far, where we have explicitly used p < 2. If it were 
possible to establish a uniform bound on the ||/"|| by some other means (e.g. by showing that the ||T n /°|| are 
bounded uniformly in n), then we could dispense with the restriction p < 2, and Proposition 13 . 1 II would hold for all 
p > 1. 

3.2 Strong convergence of the f n 

In this subsection we shall prove that the convergence of the successive iterates {/"} holds not only in the weak 
topology, but also in the Hilbert space norm. Again, we break up the proof into several lemmas. For the sake of 
convenience, we introduce the following notations 

/* = w- lim /" 

n— >oo 

u n = f n_ r 

h = f* + K*(g-Kf). (22) 
Here and below, we use the notation w-lim as a shorthand for weak limit. 
Lemma 3.15 ||ifu n || — > for n — > oo . 
Proof: Since 

u n+1 -u n = S w . p (h + (I- K*K)u n ) - S w , p (/i) - u n 
and \\u n+1 - u n \\ = \\f n+1 - f n \\ -> for n -> oo by Lemma we have 

II S w . p (h + (7 - K*K)u n ) - S w>p (/i) - u n || -» for n -» oo , (23) 

and hence also 

max(0,||u n ||-||S W!P (/i+(I-7f*ir)u™)-S w ,p(/i)|| ) -> for n -> oo . (24) 
Since S WiP is non-expansive (Lemma 12. 2|1 . we have 

II S W:P (h+(I-K*K)u n )-S w , p (h)\\ < \\{I-K*K)u n \\ < \\u n \\ ; 

therefore the "max" in (|24|l can be dropped, and it follows that 

||« n || - - K* K)u n \\ -> for n-* oo . (25) 
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Because 

\\u n \\ + \\(I-K*K)u n \\ < 2||u n ||=2||/"-/*|| 

< 2(||.r||+sup||/ fc ||) = C 

k 

where C is a finite constant (by Lemma I3.6J1 . we obtain 

< \\u n f K*K)u n f < C(\\u n \\ - \\{I-K*K)u n \\) , 
which tends to zero by (|25|l . The inequality 

||u™|| 2 - - K*K)u n \\ 2 = 2\\Ku n \\ 2 - \\K*Ku n \\ 2 > \\Ku n \\ 2 
then implies that ||i^M n || 2 — * for n — *■ oo . | 

Remark 3.16 Note that if K is a compact operator, the weak convergence to of the u n automatically implies that 
H-KTMnll tends to as n tends to oo, so that we don't need Lemma 13.151 in this case. 



If K had a bounded inverse, we could conclude from ||iTu n || — > that ||u„|| — > for n — > oo . If this is not the case, 
however, and thus for all ill-posed linear inverse problems, we need some extra work to show the norm convergence of 

r to /*. 

Lemma 3.17 For h given by ||S W)P (/i + u") - S wp (/i) - u n \\ — > for n — > oo. 
Proof: We have 

||S W)P (ft + u n )-S wd ,(/i)-« n || < ||S WJ ,(ft + u T, -Jf*Jfu n )-S W)P (ft) 

+ ||S W)P (/i + u n ) - S w ,p(/l + u n 
< ||S w , p (/i + u n -iTl!ru n )-S w , p (/i) 
+ \\K*Ku n \\ , 

where we used the non-expansivity of S w . p f Lemma 12 .20 . The result follows since both terms in this last bound tend 
to zero for n — > oo because of Lemma Ft. 151 and (j2HJ). | 



-K*Ku n )\ 
-u n \\ 



Lemma 3.18 If for some a G H, and some sequence (v n ) n ^, w-linin^oo v n = and linin^oo ||S w , p (a + v n ) — 
S w ,p(a) — w ™| =0 then \\v n \\ — > for n — > oo. 

Proof: The argument of the proof is slightly different for the cases p = 1 and p > 1, and we treat the two cases 
separately. 

We start with p > 1. Since the sequence {v n } is weakly convergent, it has to be bounded: there is a constant B such that 
Vn, ||«"|| < B, and hence also Vn,V7 G T, < B. Next, we define the set r o = {7 G T; |a 7 | > B}; since a G 7Y, this is 
a finite set. We then have V7 G T 1 = T\T a , that |o 7 | and |a 7 + i> 7 | are bounded above by IB. Recalling the definition of 

S w ^ p = (F w ^ p y\ and observing that, because p < 2, F^ y>p (x) = 1 + w 7 p(p~ 1)\x\p- 2 /2 > 1 + w 7 p(p- 1)/[2(2B) 2 -p] 
if |x| < 2B , we have 

< (1 + U ; 7 KP-1)/[2(2 J B) 2 -P])- 1 |<| 

< (l + cp(p-l)/[2(2 J B) 2 ^])- 1 |<| ; 

in the second inequality, we have used that ISm p (a;)| < |x|, a consequence of the non-expansivity of S Wj , p (see Lemma 
12.2(1 to upper bound the derivative S' w p on the interval [—2B,2B] by the inverse of the lower bound for F^ >p on 
the same interval; in the last inequality we used the uniform lower bound on the w 1 , i.e. V7, w 1 > c > 0. Rewriting 
(1 + cp(p - 1)/[2(2B) 2 -p]) _1 = C < 1, we have thus, V7 G T 1 , C>»| > \S w ^ p (a 7 + u») - S™ T , P (a 7 )|, which implies 

J] Kl 2 < fl _L 2 I] l< - S ™-,A a i + <) + S w , n Mi)? -» as n -> 00 . 
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On the other hand, since T is a finite set, and the v n tend to zero weakly as n tends to oo, we also have 

2_\. W-y\ 2 — * oo as rc. — *■ oo . 

This proves the proposition for the case p > 1 . 

For p = 1, we define a finite set T C T so that X) 7 er\r Ic-y | 2 < ( c /4) 2 , where c is again the uniform lower bound on 

the w-y. Because this is a finite set, the weak convergence of the v n implies that XXpr \ v ^y\ 2 * Oj so that we can 

concentrate on X) 7 er\r I"™! 2 om y- 

For each n, we split T 1 = T \ T into two subsets: I\ = {7 G I\; |«" + a 7 | < w 1 /2} and r x = 1^ \ T x n . If 7 G T 1 n , 
then ^,1(0^ + w") = ^,1(0^.) = (since |a 7 | < c/4 < w 7 /2), so that |v™ - i(a 7 + V™) + S WT ,i(a 7 )| = It 
follows that 

E Kl* < E K " + "7) + ^,i(a 7 )| 2 ^Oasn^co. 

7er 1 „ 7 er 

It remains to prove only that the remaining sum, X) 7 ef \ v j\ 2 a ^ so tends to as n — * 00. 

If 7 G I\ and |w™ + a 7 | > w 7 /2, then > |k™ + a 7 | — |a 7 | > w 7 /2 — c/4 > c/4 > |a 7 |, so that u" + a 7 and have 
the same sign; it then follows that 

|f™ - ^,i(a 7 + «") + S^,i(a 7 )| = |t>" - S w ^,i(a 7 + 

= |< - (a 7 + <) + ^sign«)| > ^ - |a 7 | > \ . 

This implies that 

since ||w" — S w i(a + w") + S w i(a)|| ► 0, we know on the other hand that 

E l<-^,l(«7+<)+^,l(«7)| 2 < (£) 

7er lin 

when n exceeds some threshold, N, which implies that T ± n is empty when n > N. Consequently S 7 er l w 7 | 2 = 
for n > N. This completes the proof for the case p = 1. | 

Combining the Lemmas in this subsection with the results of the previous subsection gives a complete proof of 
Theorem 13. II as stated at the start of this section. 

4 Regularization properties and stability estimates 

In the preceding section, we devised an iterative algorithm that converges towards a minimizer of the functional 

W/) = ||#/-# + I/K,, p ■ (26) 

For simplicity, let us assume, until further notice, that either p > 1 or N(K) = {0}, so that there is a unique minimizer. 

In this section, we shall discuss to what extent this minimizer is acceptable as a regularized solution of the (possibly 
ill-posed) inverse problem Kf = g. Of particular interest to us is the stability of the estimate. For instance, if 
N(_ftT) = {0}, we would like to know to what extent the proposed solution, in this case the minimizer of 'I'w^, deviates 
from the ideal solution f a if the data are a (small) perturbation of the image Kf a of f a . (If N(K) 7^ {0}, then there 
exist other / that have the same image as f a , and the algorithm might choose one of those - see below.) In this 
discussion both the "size" of the perturbation and the weight of the penalty term in the variational functional, given 
by the coefficients (u> 7 ) 7 gr, play a role. We argued earlier that we need w ^ in order to provide a meaningful 
estimate if e.g. if is a compact operator; on the other hand, if g = Kf Q , then the presence of the penalty term will 
cause the minimizer of 'I'w^ to be different from f a . We therefore need to strike a balance between the respective 
weights of the perturbation g — Kf a and the penalty term. Let us first define a framework in which we can make this 
statement more precise. 
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Because we shall deal in this section with data functions g that are not fixed, we adjust our notation for the 
variational functional to make the dependence on g explicit where appropriate: with this more elaborate notation, the 
right hand side of, for instance, 11201) is now ^w.p;^/)- (Because we work with one fixed operator K, the dependence 
of the functional on K remains "silent".) In order to make it possible to vary the weight of the penalty term in the 
functional, we introduce an extra parameter fj,. We shall thus consider the functional 

t> m Af) = \\Kf-g\\ 2 + mil, P • (27) 

Its minimizer will likewise depend on all these parameters. In its full glory, we denote it by /^ wp . s ; when confusion 
is impossible we abbreviate this notation. In particular, since w and p typically will not vary in the limit procedure 
that defines stability, we may omit them in the heat of the discussion. Notice that the dependence on w and /i arises 
only through the product /imv. 

As mentioned above, if the "error" e = g — Kf a tends to zero, we would like to see our estimate for the solution 
of the inverse problem tend to / Q ; since the minimizer of < I ) M ,w,p;g(/) differs from f if /j, ^ 0, this means that we shall 
have to consider simultaneously a limit for ji — ► 0. More precisely, we want to find a functional dependence of fx on 
the noise level e, /i = fx(e) such that 

t>0 and sup \\f* (e)tV/ , p . g - f a \\ — 7*0. (28) 

\\a-i<f \\<c y ' 

for each f in a certain class of functions. If we can achieve this, then the ill-posed inverse problem will be regularized 
(in norm or "strongly" ) by our iterative method, and /* w will be called a regularized solution. One also says in 
this case that the minimization of the penalized least-squares functional (|26(l provides us with a regularizing algorithm 
or regularization method. 



4.1 A general regularization theorem 

If the w 7 tend to oo, or more precisely, if 

VC* > : #{7 S T; w 7 < C} < 00 , (29) 

then the embedding of £> W , P = {/ € H; X) 7 er w t\ft\ P < °°} m ^ ^ s compact. (This is because the identity operator 
from £> w . p to H is then the norm-limit in C(B WiP ,H), as C — ► 00, of the finite rank operators Pc defined by Pcf — 
X)-ygrn w t (fi ft) ff wriere Tc = {7 £ T; w 7 < C}.) In this case, general compactness arguments can be used to show 
that Ij28(l can be achieved. (See also further below.) We are, however, also interested in the general case, where the 
w~f need not grow unboundedly. The following theorem proves that we can then nevertheless choose the dependence 
H(e) so that (PSJ holds: 

Theorem 4.1 Assume that K is a bounded operator from 7i to TC with \\K\\ < 1, that 1 < p < 2 and that the entries 
in the sequence w = (u> 7 ) 7g r ar £ bounded below uniformly by a strictly positive number c. Assume that either p > 1 or 
N(K) = {0}. For any g G TC and any \x > 0, define /« w p - g to be the minimizer of $^, w , p;9 (/). If fi = /i(e) satisfies 
the requirements 

lim /i(e) = and lim e 2 / We) = , (30) 

e—*0 e— »0 

then we have, for any f Q G H, 



lim 



sup 114%)™ -/ T ll 

llfl--K7o||<e 



= 



where is the unique element of minimum ||| || Wj p-norm in S = N(if) + f = {/; K f = Kf Q }. 

Note that under the conditions of Theorem l4.ll fi must indeed be unique: if p > 1, then the ||| || WjP -norm is strictly 
convex, so that there is a unique minimizer for this norm in the hyperspace N(if) + / ; if p = 1, our assumptions 
require N(if) = {0}. Note also that if ~N(K) = {0} (whether or not p = 1), then necessarily fi = f a . 

To prove Theorem 14.11 we will need the following two lemmas: 

Lemma 4.2 The functions S w . p from R to itself, defined by (|14l I16f) for p > 1, p — 1, respectively, satisfy 

\S WiP (x)-x\ < ^ \x\p- 1 . 
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Proof: For p = 1, the definition (|16fl implies immediately that \x — S w ±(x)\ — min(w/2,\x\) < u>/2, so that the 
proposition holds for x ^ 0. For x = 0, S w i(x) = 0. 

For p > 1, 5„, )P = (F^p) -1 , where F w . p (y) = y + ^ |y | p_1 sign(y) satisfies |Fu, jP (j/)| > |y|, and - J/| < 

^|tf| p - 1 . It follows that js^a;)! < |x| , and \x - S w , p (x)\ < » ^^(as)!*- 1 < ™ l^ 1 . | 



Lemma 4.3 If the sequence of vectors (vk) h converges weakly inTL tov, andlmik^oo ||k||| WlP = |u|w,p» then (vk) 
converges to v in the TL-norm, i.e. lim^oo \\v — Vk\\ = . 



Proof: It is a standard result that if w— lim^—^oo Vk — and lim^^oo ||^fc| — ll^lk then lim^—*^ |ju — ^fcll^ — 
limfc^oo (||u|| 2 + \\vk\\ 2 — 2 (v, Vk)) = \\v\\ 2 + \\v\\ 2 — 2 (v, v) — 0. We thus need to prove only that lim^oo \\vk\\ = \\v\\. 

Since the Vk converge weakly, they are uniformly bounded. It follows that the |ufc,-y| = | (vk,<p-y) | are bounded 
uniformly in k and 7 by some finite number C . Define r = 2/p. Since, for x, y > 0, \x r — y r \ < r\x — y\ max(a;, 
it follows that | k, 7 | 2 — |« 7 | 2 | < rC p ' r_1 ' | |v/c, 7 | p — kl P | ■ Because the w 7 are uniformly bounded below by c > 0, 
we obtain 

|KII 2 - \\v\\ 2 \ < ]T IK7I 2 - kl 2 | < f c 2 -^ »7 1 \vkM p - kH . 

so that it suffices to prove that this last expression tends to as k tends to 00. Define now Ufc j7 = min(|wfc )7 |, k|). 
Clearly V7 € T : lim^oo Ufc i7 = |u 7 | ; since X) 7 er w 7kl p < °°i if follows by the dominated convergence theorem 
that limfe^oo J2'yeT w i u k,-y = J2-yer w i\ v i\ P - Since 



5> 7 1 \ Vk „\* - KH = w 7 (kl p + k,7l p - 2<7 



7?r 7 er 

the Lemma follows. | 

We are now ready to proceed to the 
Proof of Theorem 4.1: 
Let's assume that /i(e) satisfies the requirements (jSOJ). 

We first establish weak convergence. For this it is sufficient to prove that if (g n )n£N is a sequence in H' such that 
\\g n — Kf Q \\ < e„, where (e n ) n gN is a sequence of strictly positive numbers that converges to zero as n — > 00, then 
w-limn^oo f*. — /T, where is the unique minimizer of < E , At , w ,p;g(/) (As predicted, we have dropped here 
the explicit indication of the dependence of /* on w and p; these parameters will keep fixed values throughout this 
proof. We will take the liberty to drop them in our notation for $ as well, when this is convenient.) For the sake of 
convenience, we abbreviate /i(e n ) as fi n . 

Then the f^ n 9n are uniformly bounded in H by the following argument: 

U f-i n C, fin C, 

= ^[||if/ _ 5n ||a + ^| / t||P^<i('^ + || / t|P ip > \ , (31) 

where we have used, respectively, the bound 1)21(1. the fact that ft-g n minimizes $ M „ :9 „ (/), Kf' = Kf a and the bound 
\\K f Q — g n \\ 2 < e 2 ■ By the assumption i|30[l . e 2 //i n tends to zero for n — > 00 and hence can be bounded by a constant 
independent of n. 

It follows that the sequence (/u n - ffn )„ eN has at least one weak accumulation point, i.e. there exists a subsequence 

. gn ) leN that has a weak limit. Because this sequence is bounded in the ||| ||-norm, by passing to a subsequence 

( fin -g n ) , we can ensure that the II /* ._ | W)P constitute a converging sequence. To simplify notation, we 

define ju^ = fJ, ni(k , and fk = ft - a i the /ft have the same weak limit / as the /* .„ . We also define cjk = 9nu k -, , 

efc =gk — Kfo and ?fe = e ti , (fc) ■ We shall show that / = /L 

Since each is the minimizer of ^ji k \gk(f)i by Proposition 13.1 01 it is a fixed point of the corresponding operator T. 
Therefore, for any 7 £ T, / 7 = ( f, ip^ ) satisfies 



/ 7 = lim (fk) 1 = lim Sji kW „, P [(/ife) 7 ] 

re — ^00 fe — »oo 

with X fc = f k + K* (g k -Kf k ) = f k + K*K(f Q - f k ) + if*e fe . 
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We now rewrite this as 



ft = lim [Sji hW p [(hk)-y\ ~ (hk)-y) + lim (h k ) 
By Lemma l4.2l thc first limit in the right hand side is zero, since 



(32) 



Sji kW „,p[(hk)-y] - (ftfe) 7 <P w^p k \{hk)~/\ p 1 /2 < p C p k [3C + e k }/2 >0 

k — >oo 



where we have used \\K || < 1 (C is some constant depending on Wj). Because lim^oo \\ek\ 
it then follows from 132(1 that 

ft = lim (h k ) 7 = ft + [K*K{ft - /)] 7 . 



0, and w-limfe^oo f k = f, 



k^c 



l|w,p 



> 



Since this holds for all 7, it follows that K*K(f^ — f) = 0. If N(A") = {0}, then this allows us immediately to conclude 
that / = ft. When N(K) 7^ {0}, we can only conclude that ft — / G N(K). Because ft has the smallest ||| || WjP -norm 

On the other hand, because the f k weakly 

(33) 



among all / € S = {f;Kf = Kf Q }, it follows that 

converge to /, and therefore, for all 7, (/fc) 7 — > ft as k — > 00, we can use Fatou's lemma to obtain 

= E^l^r < hmsup V W7 |(/ fc ) 7 r - Umsup||/ fc |^ p = lim ||/ fc IIID 



llw,p 



It then follows from (|31fl that 



lim 



< lim 



llw.p 



? 2 



1/ 



t«|P 



ft||P < 



Together, the inequalities l(33|) and ((34(1 imply that 

lim = |||/t| 



l|w,p 



(34) 



(35) 



Since ft is the unique element in S of minimal ||| | W)P — norm, it follows that f = ft. The same argument holds for 
any other weakly converging subsequence of (/u n ; 9n )neNj h follows that the sequence itself converges weakly to ft. 
Similarly we conclude from l(33|l that lim^oo |||/* ||| W)P = 1/1 •„ |||w,p ■ It then follows from Lemma FOI that the 
fu n ;g n converge to ft in the H-norm. | 



Remark 4.4 Even when p — 1 and N(K) ^ {0}, it may still be the case that, for any f a £ 7i, there is a unique 
element ft of minimal norm in S = {/ G Ti; K f — Kf }. (For instance, if K is diagonal in the </? 7 -basis, with some 
zero eigenvalues, then the unique minimizer ft in S is given by setting to zero all the components of f a corresponding 
to 7 for which Kcp 7 = 0.) In this case the proof still applies, and we still have norm-convergence of the f£r E ) w p - g to 
ft if n(e) satisfies © and ||g - Kf Q \\ < e 0. 

4.2 Stability estimates 

The regularization theorem of the previous subsection gives no information on the rate at which the regularized solution 
approaches the exact solution when the noise (as measured by e) decreases to zero. Such rates are not available in the 
general case, but can be derived under additional assumptions, discussed below. For the remainder of this section we 
shall assume that the operator K is invertible on its range, i.e. that N(if) = {0}. Suppose that the unknown exact 
solution of the problem, f Q , satisfies the constraint |/ |[w,p < P, where p > is given; in other words, we know a priori 
that the unknown solution lies in the ball around the origin with radius p in the Banach space Z? W!P ; we shall denote 
this ball by B WiP (0, p). If we also know that g lies within a distance e of Kf a in 7i', then we can localize the exact 
solution within the set 

T{e, p) = {/ e H; \\Kf - g\\ < e , |||/||| w , p < p} . 

The diameter of this set is a measure of the uncertainty of the solution for a given prior and a given noise level e. The 
maximum diameter of T, namely diam(jF)=sup{||/ — /'||; /, /' G J 7 } is bounded by 2M(e,p), where M(e,p), defined 
by 

M(e,p) =sup{||fc||; \\Kh\\ <e, |||/i||| w , p < p} , (36) 

is called the modulus of continuity of K^ 1 under the prior. (We have once more dropped the explicit reference in our 
notation to the dependence on w and p.) If (|2*5|l is satisfied, then the ball B WiP (0,p) is compact in Tt, and it follows 
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from a general topological lemma (see e.g. 120]) that M(e,p) — > when e — * ; the uncertainty on the solution thus 
vanishes in this limit. However, this topological argument, which holds for any regularization method enforcing the 
prior |||/ |||w.p < Pi does not tell us anything about the rate of convergence of any specific method. 

In what follows, we shall systematically assume that <|29[) is satisfied. We shall also make additional assumptions 
that will make it possible to derive more precise convergence results. Our specific regularization method consists in 
taking the minimizer /*.. of the functional $ Kg {f) given by (|27J) as an estimate of the exact solution f Q , where we 
leave any links between p and e unspecified for the moment. (Because of the compactness argument above, we could 
conceivably dispense with (|30|) : see below.) An upper bound on the reconstruction error — fo\\ > valid for all g 
such that \\g — K f D \\ < e, as well as uniformly in f a , is given by the following modulus of convergence: 

M A1 (e,p) = sup{||/; ;g -/||; / G H, 9 G W , \\Kf - g\\ < e , |/| WiP < p} . (37) 

The decay of this modulus of convergence as e — ► is governed by the decay of the modulus of continuity |J5SJ, as 
shown by the following proposition. 

Proposition 4.5 The modulus of convergence l|37|) satisfies 

M(e, p) < M M (e, p) < M(e + e',p + p') . (38) 

where 

e' = (e 2 + pp p ) ~ 2 and p' = Lp + t^j' . (39) 

and M(e,p) is defined by (I36|) . 

Proof: We first note that $ W9 (/£ 9 ) < $ Kg (fo) < <? + PP P because /* is the minimizer of < & M; g(/) and f Q G T{e,p). 
It follows that 

\\Kf;, g -g\\ 2 < W/^) < e 2 + p P p and p\\f;j^ p < < e 2 + H(T 

or, equivalently, /* G !F(e',p') with e' and p' given by The modulus of convergence (|3TJl can then be bounded 

as follows, using the triangle inequality. Indeed, for any / G T(t,p) and /' G !F(e',p'), we have \\K(f — f')\\ < e + e' 
and 1/ — /'|||w,p < p + p' ■ and we immediately obtain from the definition of l|3l)|l the upper bound in To derive 

the lower bound, observe that for the particular choice g = 0, the minimizer f*. g of the functional J23 is /£ ;0 = 0. 
The desired lower bound then follows immediately upon inspection of the two definitions l|36(l and (|37|l . | 



Let us briefly discuss the meaning of the previous proposition. The modulus of continuity M(e,p) yields the best 
possible convergence rate for any regularization method that enforces the error bound and the prior constraint defined 
by H36(l . Proposition 14.51 provides a relation between the modulus of continuity and the convergence rate M M (e,p) of 
the specific regularization method considered in this paper, which is defined by the minimization of the functional 
(E7jl . Optimizing the upper bound in (|3~5|l suggests the choice p = e 2 /p p , yielding e' = \[2 e and p' — 2 1 / p p . With 
these choices, we ensure that /* — > f a when e — > 0, i.e. that the problem is regularized, provided we can show that 
the modulus of continuity tends to zero with e. Moreover, once we establish its rate of decay (see below), we know that 
our regularization method is (nearly) optimal in the sense that the modulus of convergence l|37|l will decay at the same 
rate as the optimal rate given by the modulus of stability M(e, p) (We call it nearly optimal because, although the rate 
of decay is optimal, the constant multiplier probably is not.) Note that because of the assumption of compactness of 
the ball B WjP (0,p) (which amounts to assuming that l|29l) is satisfied), we achieve regularization even in some cases 
where e 2 /p does not tend to zero for e — > 0, which is a case not covered by Theorem 14.11 

In order to derive upper or lower bounds on M(e,p), we must know more information about the operator K. The 
following proposition illustrates how such information can be used. 

Proposition 4.6 Suppose that there exist sequences b = (6 7 ) 7 er and B = (£> 7 ) 7e r satisfying y '7 G T : < 6 7 , _B 7 < 
00 and such that, for all h in Ji, 

5>> 7 | 2 < ||^f <^S 7 |/i 7 | 2 . (40) 
7er 7 er 

Then the following upper and lower bounds hold for M(e, p): 

M{e,p) > 



M(e,p) < 



max 

7er 



min (pw-^P^B- 1 ' 2 



r=r i ur 2 I/ min-yeri & 7 min 7e r w 2/p 



(41) 
(42) 
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Proof: To prove the lower bound, we need only exhibit one particular h such that H-KTiH < e and |||/i||| w .p < p^ for which 
\\h\ is given by the right hand side of l|41l) . For this we need only identify the index r y m such that, V7 € T , 

v = min (pw- 1 ^, eS-y 2 ) > min eB^ 1 ' 2 ) , 



and choose h — vLp 1 . Then — v\i£J v < p and \\Kh\\ < vBl/2 < £ ; on the other hand, v equals the right hand 

side of igTJ. 

On the other hand, for any partition of T into two subsets, r = I\ LIE,, and any h 6 {h; \\Ku\\ < e, ||M| w ,p < p}, 
we have 



En 2 = Ei^i 2 + E^ 



2 

71 



< max^, 1 ] E M^ 7 | 2 + mj^K' Hmax vIVP]" 1 E ^I^tT 
7 1 7ei\ 7 2 7 2 7 er 2 

< max \b~}]e 2 + max ko~ 2/p l p 2 . 
~ 7'eiy t 7 'er 2 L T 

Since this is true for any partition T — T ± U T 2 , we still have an upper bound, uniformly valid for all h S {u; \\Ku\\ < 
e i ll M l|w,p < p}, if we take the minimum over all such partitions. The upper bound on M(e, p) then follows upon 
taking the square root. | 

To illustrate how Proposition 14.61 could be used, let us apply it to one particular example, in which we choose the 
(c/? 7 )-basis with respect to which the ||| | WjP -norm is defined to be a wavelet basis (^>\)\eA- As already pointed out in 
subsection 1.4.1 the Besov spaces B^ p (M. d ) can then be identified with the Banach spaces 2? WiP for the particular choice 

w\ = 2 CTp l A l, where a = s + d(j^ — j^J is assumed to be non-negative. For / G Bp p (M. d ), the Banach norm |/| w ,p then 

coincides with the Besov norm |/| s ,p = [Saga w >- I (/i^a) | p ] ^ P . Let us now consider an inverse problem for the 
operator K with such a Besov prior. If we assume that the operator K has particular smoothing properties, then we 
can use these to derive bounds on the corresponding modulus of continuity, and thus also on the rate of convergence 
for our regularization algorithm. In particular, let us assume that the operator if is a smoothing operator of order a, 
a property which can be formulated as an equivalence between the norm || K and the norm of h in a Sobolev space 
of negative order H~ a , i.e. in a Besov space B^2 (see e.g. [20], HZ]; |12| . [5]). In other words, we assume that for 
some a > 0, there exist constants Ai and A u , such that, for all h £ L 2 (S. d ), 

A 2 E2" 2|A|Q \hx\ 2 < \\Kh\\ 2 < A 2 E2~ 2|A|Q \h x \ 2 • (43) 

A A 

The decay rate of the modulus of continuity is then characterized as follows. 

Proposition 4.7 // the operator K satisfies the smoothness condition l|43(l . then the modulus of continuity Al(e,p), 
defined by 

M(e,p) = max{\\h\\; \\Kh\\ < e, < p} , 

satisfies 

c(£T^^p)<^(£)%^ (44) 

where a = s + d — ^) > 0, and c and C are constants depending on a and a only. 

Proof: By g3J), the operator K satisfies JUI with b x = A 2 2- 2 l A l Q and B\ = A 2 U 2~ 2 I A I Q . 
It then follows from J2IJ that 



M(e, p) > max 



mm [ p2- a ^,-?-2 a \ x 
A u 



if x — |A| could take on all positive real values, then one easily computes that this max- min would be given for 
x = — [log 2 (e/ pA u )]/ (a + a), and would be equal to (e/A u )' T ^ a+ ' T ^ p a ^ a+cr \ Because |A| is constrained to take only 
the values in N, the max-min is guaranteed only to be within a constant of this bound (corresponding to an integer 
neighbor of the optimal x), which leads to the lower bound in 1)440. 
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For the upper bound (|42[l. we must partition the index set. Splitting A into A 1 — {A; | A| < J} and A 2 = {A; | A| > J}, 
we find that 



„2 Jl 



min AeAl b x min Ae A, A ? 



P _ 6 9 2a(J-l) , 2 9 -2<x,7 

2/r> ~~ A2 Z ~T p & 



The minimizing partition for A thus corresponds with the minimizing J for the right hand side of this expression. 
This value for J is an integer neighbor of y = — [\og 2 (e/ pAi)]/ (a + a), which leads to the upper bound in 1|44J) . | 



The stability estimates we have derived are standard in regularization theory for the special case p = 2; they were 
first extended to the case 1 < p < 2 in JUj. They show the interplay between the smoothing order of the operator 
characterized by a and the assumed smoothness of the solutions characterized by a = s + — i) (for Besov spaces, 
we recall that this amounts to solutions having s derivatives in L p ). For a j(a + a) close to one, the problem is mildly 
ill-posed, whereas the stability degrades for large a. Note that if the bound 143(1 were replaced by another one, in 
which the decay of the 6a and B\ was given by an exponential decay in D = 2' A (instead of the much slower decaying 
negative power D~ 2a ) of l|43l) . then the modulus of continuity would tend to zero only as an inverse power of | loge|. 
This is the so-called logarithmic continuity which has been extensively discussed in the case p = 2, and which extends, 
as shown by an easy application of Proposition 14.61 to 1 < p < 2. 



5 An illustration 

We provide a simple illustration of the behavior of the algorithms based on minimizing $^ WOl i and $^,2 for a 
two-dimensional deconvolution problem, considering a class of objects consisting of small bright sources on a dark 
background. The image is discretized on a 256 x 256 array, denoted by /. The convolution operator K is implemented 
by multiplying the discrete Fourier transform (DFT) of / by a low-pass, radially symmetrical filter and then taking 
the inverse DFT to obtain the data g = Kf (data were zero padded on a larger 512 x 512 array when taking DFTs). 
The filter was equal (in the Fourier domain) to the convolution with itself of the characteristic function of a disk 
with radius equal to 0.1 times the maximum frequency determined by the image grid sampling; this filter provides a 
discrete model of a diffraction-limited imaging system with incoherent light. Pseudo-random Poisson noise was added 
to the data array g, for a total number of 10000 photons, corresponding to about 25 photons for the data pixel with 
the maximum intensity. 

The top of Figure 1 shows the object / (four ellipses of axis 7.5 or 5.0 pixels, slightly smoothed to avoid blocking 
effects) and the data g — Kf. The figure also shows intensity distributions along two lines in the object and data 
arrays; along the horizontal line we see how two close sources in / give rise to a joint blur in g. The bottom of 
Figure 1 shows the reconstructions obtained after 2000 steps of the iterative thresholded Landweber algorithm, which 
accurately approximate the minimizers of ^^wc^i an d ^^ 2 vo,2- The parameters p\ and pi are selected separately for 
each case, in order to achieve a good balance between sharpness and ringing and noise. 

As expected for an example of this type, the minimizer of $ MlWo ,i does a better job at resolving the two close 
sources on the horizontal line; it also gives a better concentration of the source lying along the vertical line. Because 
the object / is positive, we can apply Remark 3.12 and use PcS^ ^ p instead of S^wo.p, where Pc is the projection 
onto the convex cone of 256 x 256 arrays that take only non-negative values. 

The results are shown in Figure 2; on the top is the 2000-th iterate for the case with Pc, with the case p = 1 (left) 
and p = 2 (right). In each case we used the same values of p\ and pi as for the reconstructions without positivity 
constraint, which are shown for comparison on the bottom of Figure 2. Exploiting the positivity constraint leads to 
better resolution and less ringing for this example, where the background is zero. 

Figure 3 gives a different view of the same solutions, with a compressed gray scale ranging from —2% (darker) to 
+2% (lighter) of the maximum intensity in the original object. This has the effect of highlighting the ringing effects 
and the noise. Both ringing and noise are seen to be less pronounced for the minimizer of ^^wo.i (top left) than that 
of $)i 2Wl .2. Although the introduction of the positivity constraint removes the ringing phenomenon (top of Figure 3), 
we nevertheless see that noise is better suppressed with p = 1 . 

To produce Figures 1 to 3, the same program was used in every case; the only change was the choice of the nonlinear 
operator applied at the n-th iteration step to + K*(g — Kf n ^ 1 ). For realistic applications on data of this type, 
more sophisticated algorithms exist. With the £ 2 -penalty, for instance, the reconstructions in our simple example can 
be obtained directly by a regularized Fourier deconvolution. These examples are included to illustrate the differences 
that can be achieved by the choice of p, and do not constitute a claim that the iterative algorithms discussed in this 
paper are optimal. The "data" in this example are also only simple-minded caricatures of quasi point-sources data 
sets. While similar examples may have applications in astronomy, most natural images have a much richer structure. 
However, as is abundantly documented, the wavelet transforms of natural images tend to have distributions that 
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Figure 1: The object / (top left), the image g after convolving with a radially symmetrical low-pass filter and adding 
pseudo-random Poisson noise (top right), and the minimizers of <I> Ml wo,i (bottom left) and $,, 2W0j 2 (bottom right). 
The values \x\ — 0.001 and \ii = 0.0001 have been selected separately for the i 1 - and ^ 2 -cases, to obtain a balance 
between sharpness and ringing and noise. 



are sparse. A similar improvement in accuracy can be expected by applying i 1 rather than £ 2 penalizations on the 
wavelet coefficients in inverse problems involving natural images, similar to the gain achieved in denoising with a soft 
thresholding rather than with a quadratic penalty. 

6 Generalizations and additional comments 

The algorithm proposed in this paper can be generalized in several directions, some of which we list here, with brief 
comments. 

The penalization functionals |||/||| w .p we have used arc symmetric, i.e. they are invariant under the exchange of 
/ for — /. We can equally well consider penalization functionals that treat positive and negative values of the / 7 
differently. If (w+) 7e r and (w~) 7e r are two sequences of strictly positive numbers, then we can consider the problem 
of minimizing the functional 

*w + ,w-, P (/) = \\Kf g\\ 2 + £(«)[/ 7 ft + K)[/ 7 D (45) 

where, for r € R, r+ = max(0, r), r_ = max(0, — r). One easily checks that all the arguments in this paper can be 
applied equally well (after some straightforward modifications) to the general functional (|45|) , provided we replace the 
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Figure 2: A comparison to illustrate the impact of the positivity constraint, imposed at every iteration step. On the 
left are the fixed points for -PcS^ lW o,i (top) and S^ lWo i (bottom); on the right those of -Pc^V 2 w ,2 (top) and S^ 2Wo 2 
(bottom). The data and the values of H\, 1x2 are the same as in Figure 1. 



thresholding functions S Wi . p in the iterative algorithm by S w + w - p , where, for p > 1, 

S w + ,w~, P = ( F w+,w-, P ) 1 wi th F w+tW - iP (x) = x + I w+lx} 1 ^ 1 - | w^lxfS 1 , 

and for p = 1, 

{x + w~/2 if x < -w~/2 
if -W-/2 < x < w+/2 

x-w + /2 if x>w+/2. 

The above applies when the / 7 are all real; a generalization to complex / 7 is not straightforward. When dealing with 
complex functions, one could generalize the penalization X) 7 er w "/\fi\ P t° S 7 er \f \=£o (arg/ 7 ) | /-y | p , where the weight 
coefficients have been replaced by strictly positive 27r-periodic C^-functions on the 1-torus T = {x G C, \x\ = 1}. It 
turns out, however, that the variational equation for e larg ^"' = f-ylf-yl^ 1 then no longer uncouples from that for |/ 7 | 
(as it does in the case where u> 7 is a constant), leading to a more complicated "generalized thresholding" operation 
in which the absolute value and phase of the complex number S w ^ p (f^) are given by a system of coupled nonlinear 
equations. 

When the (</? 7 ) 76 r - basis is chosen to be a wavelet basis, then we saw in subsection 1.4.1 that is is possible to make 
the II || WiP -norm equivalent to the Besov-norm |[ | S;P , by choosing the weight for |(/, ^\)\ p to be given by w\ = 2l A l <Tp , 
where |A| is the scale of wavelet The label A contains much more information than just the scale, however, since 
it also indicates the location of the wavelet, as well as its "species" (i.e. exactly which combination of 1-dimensional 
scaling functions and wavelets is used to construct the product function ^a)- One could choose the w\ so that certain 
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Figure 3: A different view of the four solutions in Figure 2, with a different dynamic range for the image intensity 
gray scale, to highlight ringing and other artifacts. 



regions in space are given extra weight, or on the contrary de-emphasized, depending on prior information. In pixel 
space, prior information on the support of the object to be reconstructed can be easily enforced by simply setting 
the corresponding weights to very small values, or by choosing very large weights outside the object support. This 
type of constraint is of uttermost importance to achieve superresolution in inverse problems in optics and imaging (see 
e.g. When thresholding in the wavelet domain, a constraint on the object support can be enforced in a similar 

way due to the good spatial localization of the wavelets. If no a priori information is known, one could even imagine 
repeating the wavelet thresholding algorithm several times, adapting the weights w\ after each pass, depending on 
the results of the previous pass; this could be used, e.g., to emphasize certain locations at fine scales if coarser scale 
coefficients indicate the possible existence of an edge. The results of this paper guarantee that each pass will converge. 

In this paper we have restricted ourselves to penalty functions that are weighted £ p -norms of the / 7 = (/, <^ 7 ). 
The approach can be extended naturally to include penalty functions that can be written as sums, over 7 € T, of more 
general functions of / 7 , so that the functional to be minimized is then written as 



* w Cf) = 11*7 



7er 



The arguments in this paper will still be applicable to this more general case if the functions W 7 : R+ — * R+ are convex, 
and satisfy some extra technical conditions, which ensure that the corresponding generalized component-shrinkage 
functions S-y are still non-expansive (used in several places), and that, for some c > 0, 



inf 

Ml<i 



inf I 

llolKc 



- 2 E 

7er 



u 7 + S* 7 (a 7 ) 



5* 7 (^ 7 ~\~ ^ 7 



> 
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(used in Lemma f3. 1811 . To ensure that both conditions are satisfied, it is sufficient to choose functions W 1 that are 
convex, with a minimum at and e.g. twice diffcrcntiable, except possibly at (where they should nevertheless still 
be left and right differentiable), and for which W" > 1 on V \ {0}, where V is a neighborhood of 0. 

We conclude this paper with some comments concerning the numerical complexity of the algorithm. 

At each iteration step, we must compute the action of the operator K*K on the current object estimate, expressed 
in the <^ 7 -basis. In a finite-dimensional setting where the solution is represented by a vector of length N, this 
necessitates in principle a matrix multiplication of complexity 0(N 2 ), if we neglect the cost of the shrinkage operation 
in each iteration step. After sufficient accuracy is attained and the iterations are stopped, the resulting (/™) 7 must 
be transformed back into the standard representation domain of the object function, except in the special case where 
the ip 7 are already the basis for the standard representation (e.g., if the y> 7 correspond to the pixel representation for 
images). This adds one final 0(A^ 2 )-matrix multiplication. In this scenario, the total cost equals that of the classical 
Landweber algorithm on the basis of a comparable number of iterations. Since Landweber's algorithm typically 
requires a substantial number of iterations, it follows that this method can become computationally competitive with 
the 0(N 3 ) SVD algorithms only when N is large compared to the number of iterations necessary. 

Several methods have been proposed in the literature to accelerate the convergence of Landweber's iteration, which 
could be used for the present algorithm as well. For instance, one could use some form of preconditioning (using the 
operator D of the Remark 12.4(1 or group together k Landweber iteration steps and apply thresholding only every k 
steps (see e.g. the book |2()|^). 

Much more substantial gains can be obtained when the operator K*K can be implemented via fast algorithms. In 
a first important class of applications, the matrix 

((K* K(pj, ipj'}) 7 , gr is sparse; if, for instance, there are only O(N) non vanishing entries in this matrix, then standard 
techniques to deal with the action of sparse matrices will reduce the cost of each iteration step to O(N) instead of 
0(N 2 ). If the ^5 7 -basis is a wavelet basis, this is the case for a large class of integro-differential operators of interest 
(see e.g. 0]). Even if K*K is sparse in the </? 7 -basis, but has an even simpler expression in another basis, and if 
the transforms back and forth between the two bases can be carried out via fast algorithms, then it may be useful to 
implement the action of K* K via these back-and-forth transformations. For instance, if the object is of a type that 
will have a sparse representation in a wavelet basis, and the operator K* K is a convolution operator, then we can 
pick the i/? 7 -basis to be a wavelet basis, and implement the operator K* K by doing, successively, a fast reconstruction 
from wavelet coefficients, followed by a FFT, a multiplication in the Fourier domain, an inverse FFT, and a wavelet 
transform, for a total complexity of 0(N log N). One can obtain similar complexity estimates if the algorithm is 
modified to not only take the nonlinear thresholding into account, but also additional projections Pc on a convex set, 
such as the cone of functions that are a.e. positive; in this case, after the thresholding operation, one needs to carry 
out an additional fast reconstruction from, say, the wavelet domain, take the positive part, and then perform the fast 
transform back, without affecting the 0(N log AT) complexity estimate. 

The situations described above cover several applications of great practical relevance, in which we expect this 
algorithm will prove itself to be an attractive competitor to other fast techniques for large-scale inverse problems with 
sparsity constraints. 
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Appendices 

A Wavelets and Besov spaces 

We give a brief review of basic definitions of wavelets and their connection with Besov spaces. This will be a sketch 
only; for details, we direct the reader to e.g. [2911121 151 128 j . 

For simplicity we start with dimension 1. Starting from a (very special) function ip we define 



and we assume that the collection {ipj t k',j, k £ Z} constitutes an orthonormal basis of L 2 (R). For all wavelet bases 
used in practical applications, there also exists an associated scaling function <j>, which is orthogonal to its translates 
by integers, and such that, for all j € Z, 



where the (f>j.k are defined analogously to the ipj,k- Typically, the functions (f> and ip are very well localized, in the 
sense that VN € N, J R (1 + |x|) Ar (|^(x)| + \ip(x)\)dx < oo; one can even choose (j) and such that they are supported 
on a finite interval. This can be achieved with arbitrary finite smoothness, i.e. for any preassigned L € N, one can find 
such (j) and ip that are moreover in C L (R). Because of one can consider (inhomogeneous) wavelet expansions, in 
which not all scales j are used, but a cut-off is introduced at some coarsest scale, often set at j = 0. More precisely, 
we shall use the following wavelet expansion of / 6 L 2 , 



Wavelet bases in higher dimensions can be built by taking appropriate products of one-dimensional wavelet and scaling 
functions. Such d-dimensional bases can be viewed as the result of translating (by elements k of Z d ) and dilating (by 
integer powers j of 2) of not just one, but several (finite in number) "mother wavelets" , typically numbered from 1 to 
2 d — 1. It will be convenient to abbreviate the full label (including j, k and the number of the mother wavelet) to just 
A, with the convention that |A| = j. We shall again cut off at some coarsest scale, and we shall follow the convenient 
slight abuse of notation used in jS] that sweeps up the coarsest- j scaling functions (as in l|47l) ') into the 4" a as well. We 
thus denote the complete d-dimensional, inhomogeneous wavelet basis by {^\; A E A}. 

It turns out that {^f\; A <E A} is not only an orthonormal basis for L 2 (M. d ), but also an unconditional basis for a 
variety of other useful Banach spaces of functions, such as Holder spaces, Sobolev spaces and, more generally, Besov 
spaces. Again, we review only some basic facts; a full study can be found in e.g. |29U12l l%], The Besov spaces 9 (K.' 1 ) 
consist, basically, of functions that "have s derivatives in L p " ; the parameter q provides some additional fine-tuning 
to the definition of these spaces. The norm ||/|| Ba in a Besov space B^ q is traditionally defined via the modulus of 
continuity of / in L P (R), of which an additional weighted L^-norm is then taken, in which the integral is over different 
scales. We shall not give its details here; for our purposes it suffices that this traditional Besov norm is equivalent with 
a norm that can be computed from wavelet coefficients. More precisely, let us assume that the original 1-dimcnsional 
4> and ip arc in C L (R), with L > s, that a = s + d(^ — -) > 0, and define the norm || • | s p q by 



^ k {x) = V' 2 i>(2ix-k) ,j,keZ, 



Sp&n{<j)j,k;k G Z} Span{^ i / C ; k £ 



Z} = Span{0 J+ i ifc ; k e Z} , 



(46) 




(47) 




(48) 
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Then this norm is equivalent to the traditional Besov norm, |/| s;p . q ~ ||/|| ss j that is, there exist strictly positive 
constants A and B such that 



the Besov spaces. A particularly convenient choice, to which we shall stick in the remainder of this paper, is q = p, 
for which the expression simplifies to 



to alleviate notation, we shall drop the extra index q wherever it normally occurs, on the understanding that q = p 
when we do so. 

When < p, q < f, the Besov spaces can still be defined as complete metric spaces, although they are no 
longer Banach spaces (because Q48JI no longer is a norm), This allows for more local variability in local smoothness 
than is typical for functions in the usual Holder or Sobolev spaces. For instance, a real function / on I that is 
piecewise continuous, but for which each piece is locally in C s , can be an element of B*(R), despite the possibility of 
discontinuities at the transition from one piece to the next, provided p > is sufficiently small, and some technical 
conditions are met on the number and size of the discontinuities, and on the decay at oo of /. 

Wavelet bases are thus closely linked to a rich class of smoothness spaces; they also provide a good tool for high 
accuracy nonlinear approximation of a wide variety of functions. For instance, if the bounded function / on [0, 1] 
has only finitely many discontinuities, and is C s elsewhere, then one can find a way of renumbering (dependent on 
/ itself) the wavelets in the standard wavelet expansion of /, so that the distance in, say, L 2 ([0, 1]) between / and 
the first N terms of this reordered wavelet expansion, decreases proportionally to N~ s . If s is large, it follows that 
a very accurate approximation to / can be obtained with relatively few wavelets; this is possible because the smooth 
patches of the piecewise continuous / will be well approximated by coarse scale wavelets, which are few in number; 
to capture the behavior of / near the discontinuities much more localized finer scale wavelets are required, but only 
those wavelets located exactly near the discontinuities will be needed, which amounts again to a small number. 

In higher dimensions, d > 1, the suitability of wavelets is influenced by the dimension of the manifolds on which 
singularities occur. If the singularities in the functions of interest are solely point singularities, then expansions using 
N wavelets can still approximate such functions with distances that decrease like N~ 3 , depending on their behavior 
away from the singularities. If, however, we are interested in / that may have, e.g. discontinuities along manifolds 
of dimension higher than 0, then such wavelet approximations are not optimal. For instance, if / : IR 2 — > K is 
piecewise C L , with possible jumps across the boundaries of the smoothness domains, which are themselves smooth 
(say, C L again) curves, then TV-term wavelet approximations to / cannot achieve an error rate decay faster than 
N^ 1 / 2 , regardless of the value of L > 1. 

It follows that whenever we are faced with an inverse problem that needs regularization, in which the objects 
to be restored are expected to be mostly smooth, with very localized lower dimensional areas of singularities, we 
can expect that their expansions into wavelets will be sparse. This sparsity can be expressed by requiring that the 
wavelet coefficients (possibly with some scale-dependent weight) have a finite (or small) ^ p -norm, with 1 < p < 2, 
or equivalently that the Besov-equivalent norm |/| s is finite (or small), where |/| s is exactly of the form $ w ,p 
defined in J2J. 

B A fixed-point theorem 

We provide here the proof of the theorem needed to establish the weak convergence of the iterative algorithm. The 
theorem is given in [2]; we give a simplified proof here (see the remark at the end), which nevertheless still follows 
the main lines of Opial's paper. 

Theorem B.l Let C be a closed convex subset of the Hilbert space Ti and let the mapping A : C —* C satisfy the 
following conditions: 

(i) A is non- expansive: ||Au — Ai/|| < \\v — Vu,u' € C , 

(ii) A is asymptotically regular: \\A. n+1 v — A"i>|| > 0, \fv G C , 




II/II 




(iii) the set T of the fixed points of A in C is not empty . 
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Then, Vw <E C, the sequence (A n v) n ^m converges weakly to a fixed point in T . 

The proof of the main theorem will follow from a series of lemmas. As before, we use the notation w-lim to indicate 
a weak limit. 

Lemma B.2 If u,v G Ti, and if (v n )neN is a sequence in TL such that w-Yivan^^ v n — v, and u ^ v, then 

IK - u|| ■ 



Proof: We have liminfn^oo \\v n — u\\ 2 = liminf„^oo |jw n — 7;|| 2 + ||ii — m|| 2 + 2 lim n ^oo Re(v n — v, v—u) = lim inf 
i>|| 2 + II u — 1 1 2 , whence the result. 



n — >oc u n 



Lemma B.3 Suppose that A : C — > C satisfies condition (i) in Theorem \B.l\ 

If w-limn^oo u n = u, and limn^oo \\u n — Au n — h\\ = , then h — u — Au . 

Proof: Because of the non-expansivity of A (assumption (i)), we have — (h+ Au)\\ < ||u„ — h— Au n \\ +||Au„ — Aw|j 
< \\u n - h - Au n \\ +\\u n - u\\ . Hence, 

lim inf ||u n — (h + Au)|| < lim \\h — (u n — Au n )|| + lim inf \\u n — u\\ 

n — >oo n — >oo n — >oo 

= lim inf ||u„ — u\\ 

n — >oo 

It then follows from Lemma fB. 21 that u = h + Au or h = u — Au. | 

Lemma B.4 Suppose that A : C — > C satisfies conditions (i) and (ii) in Theorem XB . 1\ If a subsequence o/(A n u) ne n, 
with v £ C, converges weakly in C, then its limit is in T . 

Proof: Suppose w-limfc^oo A" k v = u . Since, by the assumption (ii) of asymptotic regularity, limn.+oo ||A n u — 
AA"w|| = , we have lim^oo ||A n *i> — AA" fc w|| = . By Lemma fB. 31 it follows that u — Au = , i.e. that u is in 
T. U 



Lemma B.5 Suppose that A : C — ► C satisfies conditions (i) and (hi) in Theorem \B. 1\ Then, for all h £ T ' , and all 

v € C, the sequence (\\A n v — /i||)neN * s non-increasing and thus has a limit. 

Proof: Since A is non-expansive, we have indeed || A n+1 v — h\\ = \\ AA n v — Ah\\ < \\A n v — h\\ . | 
We can now proceed to the 



Proof of Theorem IB7T1 

Let v be any element in C. Take an arbitrary h £ T . By Lemma IB. 51 we then have 
limsup n ^ 00 ||A n «|| < limsup^oo ||A n « - ft|| +||fc|| = ||h|| + lim n ^ 00 \\A n v-h\\ < oo . 

Since the ||A n u|| are thus uniformly bounded, it follows from the Banach-Alaoglu theorem that they must have at 
least one weak accumulation point. 

The following argument shows that this accumulation point is unique. Suppose we have two different accumulation 
points : w-liuik^^ A nk v = u, and wj-lim^oo A ne v = u , with u =/= u. 

By Lemma lB.4l u and ii must both lie in T ', and by Lemma IB. 51 the limits lim,^^ || A n v — u\\ and linin^oo || A n v — u\\ 
both exist. 

Since u =^ u, we obtain from Lemma IB. 21 that liminf^oo ||A™ fc u — u\\ > liminffc^ooH A nk v — u\\ . On the other hand, 
because (|j A nk v — u||)fceN and (|| A nk v — u\\)keN are each a subsequence of a convergent sequence, liminffe^oo || A nk v — u\\ 
= lim n _ +00 \\A n v — w|| and liminffc-^oo ||A nfc t> — u|| = lim^oo \\A nk v — u\\ . It follows that lim n ^oo ||A"u — u\\ > 
limj^oo ||A"i> — u\\ . In a completely analogous way (working with the subsequence A n 'v instead of A nk v) one derives 
the opposite strict inequality. Since both cannot be valid simultaneously, the assumption of the existence of two 
different weak accumulation points for (A n w) ne N is false. 

It thus follows that A n v converges weakly to this unique weak accumulation point. g 
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Remark B.6 It is essential to require that the set T is not empty since there are asymptotically regular, non-expansive 
maps that possess no fixed point. However, the only place where we used this assumption was in showing that the 
|| A"w|| were bounded. If one can prove this boundedness by some other means (e.g. by a variational principle as we 
did in the iterative algorithm), then we automatically have a weakly convergent subsequence (|| A™ fc w||)fc e N, and thus, 
by Lemma [B. 41 an element of T . 

Remark B.7 The simplification of the original argument of |31| (obtained through deriving the contradiction in the 
proof of Theorem lB.l|) avoids having to appeal to the convexity of T (which is true but not immediately obvious) and 
having to introduce the auxiliary sets used in |31j . 
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